The maximum number of parameters resident per GPU before releasing. Smaller values use less memory, but slow down training.